The Annals of Statistics 

2007, Vol. 35, No. 1, 41-69 

DOI: 10.1214/009053606000001091 

© Institute of Mathematical Statistics, 2007



ASYMPTOTICS FOR SLICED AVERAGE VARIANCE
ESTIMATION¹

By Yingxing Li and Li-Xing Zhu 

Cornell University and Hong Kong Baptist University 

In this paper, we systematically study the consistency of sliced average variance estimation (SAVE). The findings reveal that when the response is continuous, the asymptotic behavior of SAVE is rather different from that of sliced inverse regression (SIR). SIR can achieve $\sqrt{n}$ consistency even when each slice contains only two data points. However, SAVE cannot be $\sqrt{n}$ consistent, and it is not even consistent when each slice contains a fixed number of data points that does not depend on $n$, where $n$ is the sample size. These results theoretically confirm the notion that SAVE is more sensitive to the number of slices than SIR. Taking this into account, a bias correction is recommended in order to allow SAVE to be $\sqrt{n}$ consistent. In contrast, when the response is discrete and takes finitely many values, $\sqrt{n}$ consistency can be achieved. Therefore, an approximation through discretization, which is commonly used in practice, is studied. A simulation study is carried out for the purposes of illustration.

1. Introduction. Dimension reduction has become one of the most important issues in regression analysis because of its importance in dealing with problems with high-dimensional data. Let $Y$ and $x = (x_1, \dots, x_p)^T$ be the response and $p$-dimensional covariate, respectively. In the literature, when $Y$ depends on $x$ through a few linear combinations $B^T x$ of $x$, where $B = (\beta_1, \dots, \beta_k)$, several methods have been proposed for estimating the projection directions $B$ or the space spanned by $B$, such as projection pursuit regression (PPR) [11], the alternating conditional expectation (ACE) method [1], principal Hessian directions (pHd) [17], minimum average variance estimation (MAVE) [23], iterated pHd [7] and profile least-squares estimation [10]. All of these methods estimate the projection directions $B$ or the subspace spanned by $B$ when $B$ enters through the mean regression function.

For more general models in which some $\beta_i$ enter the variance component of the model, two estimation methods, sliced inverse regression (SIR) [16] and sliced average variance estimation (SAVE) [5, 9], have received much attention. SIR is based on the estimation of the conditional mean, and SAVE on the estimation of the conditional variance, of the covariates given the response, that is, of the inverse regression. The aim of these two methods is to estimate the central dimension reduction (CDR) space, which is defined as follows. Suppose that $Y$ is independent of $x$ given $B^T x$, which is written as $Y \perp\!\!\!\perp x \mid B^T x$, where $\perp\!\!\!\perp$ stands for independence and $B = (\beta_1, \dots, \beta_k)$ is an unknown $p \times k$ matrix whose columns are of unit length under the Euclidean norm and mutually orthogonal. A dimension reduction subspace is defined as the space spanned by the column vectors of such a $B$, and the CDR subspace is the intersection of all of the dimension reduction subspaces that satisfy this conditional independence (see [3, 4]). Under certain regularity conditions, the CDR subspace is itself a dimension reduction subspace; we denote it by $S_{y|x}$. SIR and SAVE are used to estimate $S_{y|x}$. If we let $z = \Sigma_x^{-1/2}(x - E(x))$ be the standardized covariate, then $S_{y|z} = \Sigma_x^{1/2} S_{y|x}$ (see [4] for details). Hence, the estimation can be carried out equivalently for the pair of variables $(y, z)$. For convenience, we first use the standardized variable $z$ to study the asymptotic behavior. In practice, the covariance matrix and the mean must be estimated, and thus the results involving the estimated covariate $\hat z = \hat\Sigma_x^{-1/2}(x - \bar x)$ will be reported as corollaries, where $\hat\Sigma_x$ and $\bar x$ are the sample covariance matrix and sample mean of the $x_i$'s, respectively.
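The standardization step can be sketched in Python with NumPy. This is a minimal illustration, not the authors' code: the function name `standardize` is hypothetical, and we assume that the sample covariance uses the divisor $n - 1$ (the text does not fix the divisor) and is positive definite.

```python
import numpy as np

def standardize(x):
    """Return z_hat = Sigma_hat^{-1/2} (x - x_bar) row by row for an (n, p) array x.

    Uses the symmetric inverse square root of the sample covariance, obtained
    from its eigendecomposition (assumes Sigma_hat is positive definite).
    """
    x = np.asarray(x, dtype=float)
    xc = x - x.mean(axis=0)                    # center at the sample mean x_bar
    sigma_hat = np.cov(xc, rowvar=False)       # sample covariance, divisor n - 1
    evals, evecs = np.linalg.eigh(sigma_hat)   # sigma_hat = V diag(evals) V^T
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return xc @ inv_sqrt                       # inv_sqrt is symmetric, so rows are z_hat_i^T
```

After this transformation the standardized sample has mean zero and identity sample covariance, which is the sample analogue of $E(z) = 0$ and $\mathrm{Cov}(z) = I_p$.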

Denote the inverse regression function by $E(z|Y = y)$ and the conditional covariance of $z$ given $Y = y$ by $\Sigma_{z|y} := E((z - E(z|Y))(z - E(z|Y))^T \mid Y = y)$. SIR estimates the CDR subspace via the eigenvectors associated with the nonzero eigenvalues of the covariance matrix $\mathrm{Cov}(E(z|Y))$; SAVE estimates it via the eigenvectors associated with the nonzero eigenvalues of the matrix $E((I_p - \Sigma_{z|Y})(I_p - \Sigma_{z|Y})^T)$. For SIR estimation, we need the linearity condition

$$E(z \mid P_{S_{y|z}} z) = P_{S_{y|z}} z. \tag{1.1}$$

For SAVE estimation we also assume that

$$\mathrm{Cov}(z \mid P_{S_{y|z}} z) = I_p - P_{S_{y|z}}, \tag{1.2}$$

where $P_{(\cdot)}$ stands for the projection operator with respect to the standard inner product.
It is worth pointing out that SAVE deserves more attention, as several papers have revealed that SAVE is more comprehensive than SIR: under regularity conditions, the CDR space recovered by SAVE contains that recovered by SIR (see [6, 24]). In particular, SIR fails in symmetric regressions $y = f(B^T x) + \varepsilon$, where $f$ is a symmetric function of the argument $B^T x$. Therefore, theoretically, SAVE should be a more powerful method than SIR for estimating the CDR space under regularity conditions.

Clearly, the primary aim is to estimate either $\mathrm{Cov}(E(z|Y))$ or $E[(I_p - \Sigma_{z|Y})(I_p - \Sigma_{z|Y})^T]$. Li [16] proposed a slicing estimation that involves a very simple and easily implemented algorithm to estimate the inverse regression function, in which the slicing estimator is the weighted sum of the sample covariances of the $z_i$'s in each slice of the $y_i$'s. He also demonstrated, by means of a simulation, that the performance of the slicing estimator is not sensitive to the choice of the number of slices. Zhu and Ng [27] provided a theoretical background for Li's empirical study and proved that $\sqrt{n}$ consistency and asymptotic normality hold provided the number of slices is within the range $\sqrt{n}$ to $n/2$. In other words, $\sqrt{n}$ consistency can be ensured when each slice contains between 2 and $\sqrt{n}$ points. The only thing that is affected by different numbers of slices is the asymptotic variance of the estimator. A relevant reference is Zhu, Miao and Peng [26]. These results are somewhat surprising from the viewpoint of nonparametric estimation. Note that the number of slices plays the role of a tuning parameter, such as the bin width in a histogram estimator or, more generally, the bandwidth in a kernel estimator. We can regard a kernel estimator as a smoothed version of the slicing estimator with moving windows. However, as is well known, to ensure $\sqrt{n}$ consistency of the kernel estimator, the bandwidth must be selected with care. Zhu and Fang [25] proved the asymptotic normality of the kernel estimator of SIR when the bandwidth is selected in the range $n^{-1/2}$ to $n^{-1/4}$, which means that, in probability, each window must contain $n^{\delta}$ points for some $\delta > 0$. Therefore, for SIR, Li's slicing estimation has the advantage that a less smoothed estimator is even less sensitive to the tuning parameter.
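To make the slicing construction concrete, the following sketch computes the SIR-type estimate $I_p - \frac{1}{H}\sum_h \hat\Sigma(h)$ of $\mathrm{Cov}(E(z|Y)) = I_p - E(\mathrm{Cov}(z|Y))$ from slices of $c$ consecutive observations sorted by $y$, in the form analyzed by Zhu and Ng [27]. It is an illustration under stated assumptions, not a reference implementation: the helper names are hypothetical, within-slice covariances use the divisor $c - 1$, and leftover points when $c$ does not divide $n$ are simply dropped rather than forming a smaller last slice.

```python
import numpy as np

def slice_covariances(z, y, c):
    """Within-slice sample covariances Sigma_hat(h), h = 1..H, for slices of c points.

    Rows of z are sorted by y and the first H * c rows (H = n // c) are cut into
    H consecutive slices; leftover rows are dropped in this sketch.
    """
    z = np.asarray(z, dtype=float)
    order = np.argsort(y, kind="stable")
    n, p = z.shape
    H = n // c
    zs = z[order][: H * c].reshape(H, c, p)       # zs[h, j] is z_(h, j)
    dev = zs - zs.mean(axis=1, keepdims=True)      # z_(h, j) - zbar_(h)
    return np.einsum("hja,hjb->hab", dev, dev) / (c - 1)

def sir_matrix(z, y, c):
    """Slicing estimate of Cov(E(z|Y)) = I_p - E(Cov(z|Y))."""
    sig = slice_covariances(z, y, c)
    return np.eye(sig.shape[1]) - sig.mean(axis=0)
```

For a monotone link such as $y = (\beta^T z)^3 + 0.1\varepsilon$, the leading eigenvector of `sir_matrix` should align closely with $\beta$ even for small $c$, in line with the insensitivity to $c$ described above.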

The problem of whether SAVE has similar properties to SIR is then of 
great interest. Empirical studies have examined this and there is a general 
feeling that SAVE may be more sensitive to the choice of the number of 
slices than SIR. Cook [5] mentioned that the number of slices plays the role
of a tuning parameter and thus SAVE may be affected by this choice. The
empirical study of Zhu, Ohtaki and Li [28] was consistent with the sensitivity 
of SAVE to the selection of the number of slices, but no theoretical results 
have been produced to show why and how the number of slices affects the 
performance of SAVE. 
In this paper, we present a systematic study of this problem and obtain the following results.

1. When $Y$ is discrete and takes finitely many values, SAVE is able to achieve $\sqrt{n}$ consistency.

2. For continuous $Y$, the convergence of SAVE is almost completely different from that of SIR. Let $c$ denote the number of data points in each slice. When $c$ is a fixed constant, SAVE is not consistent. When $c \sim n^b$ with $b > 0$, the SAVE estimator is consistent, but it cannot be $\sqrt{n}$ consistent.

3. A bias correction is proposed to allow the SAVE estimator to be $\sqrt{n}$ consistent. Since the discretized approximation is commonly used in practice, we present asymptotic normality in a general setting.

Note that Cook and Ni ([8], Section 7) investigated the asymptotic be- 
havior of the slicing estimator of the SAVE matrix and reported a result 
that is relevant to Theorem 2.3 in this paper. Another relevant paper is [12]. 

The rest of this paper is organized as follows. Section 2 investigates when the estimator is $\sqrt{n}$ consistent. Section 3 contains the bias correction and an approximation via discretization. Section 4 reports a simulation study in which the performances of SIR, SAVE and the bias-corrected SAVE are compared. The proofs of the theorems are given in the Appendix.

2. Asymptotic behavior of the slicing estimator. As matrix operations are involved, we will write, unless stated otherwise, $A^2 = AA^T$, where $A$ is a square matrix. We first describe the slicing estimator for the SAVE matrix

$$E(I_p - \Sigma_{z|Y})^2.$$

Suppose that $\{(z_1, y_1), \dots, (z_n, y_n)\}$ is a sample. Sort all of the data $(z_i, y_i)$, $i = 1, 2, \dots, n$, in ascending order of the $y_i$. Define the order statistics $y_{(1)} \le y_{(2)} \le \cdots \le y_{(n)}$ and, for every $1 \le i \le n$, let $z_{(i)}$ be the concomitant of $y_{(i)}$. For any integer $c$, we group every $c$ data points and introduce a double subscript $(h, j)$, where $h$ refers to the slice number and $j$ refers to the order number of an observation in the given slice. Then

$$y_{(h,j)} = y_{(c(h-1)+j)}, \qquad z_{(h,j)} = z_{(c(h-1)+j)}, \qquad \bar z_{(h)} = \frac{1}{c}\sum_{j=1}^{c} z_{(h,j)}.$$

The number of data points in the last slice may be less than $c$, but the calculation is similar and the asymptotic results are still valid. Without loss of generality, suppose that we have $H$ slices and that $n = c \times H$. The sample version of the conditional variance of $z$ given $y$ in each slice is

$$\hat\Sigma(h) = \frac{1}{c-1}\sum_{j=1}^{c}\big(z_{(h,j)} - \bar z_{(h)}\big)^2. \tag{2.1}$$
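The estimate of $E((I_p - \Sigma_{z|Y})^2)$ is defined as

$$\frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma(h)\big)^2 = I_p - \frac{2}{H}\sum_{h=1}^{H}\hat\Sigma(h) + \frac{1}{H}\sum_{h=1}^{H}\big(\hat\Sigma(h)\big)^2. \tag{2.2}$$

Note that $I_p - \frac{1}{H}\sum_{h=1}^{H}\hat\Sigma(h)$ is the SIR estimator. Zhu and Ng [27] proved the $\sqrt{n}$ consistency of $I_p - \frac{1}{H}\sum_{h=1}^{H}\hat\Sigma(h)$ under certain regularity conditions. Hence, throughout the rest of the paper, we only investigate the asymptotic properties of $\Lambda_n = \frac{1}{H}\sum_{h=1}^{H}(\hat\Sigma(h))^2$, and the results for the SAVE estimator are presented as corollaries. Moreover, $\Lambda_n$ can be rewritten as

$$\begin{aligned}
\Lambda_n &= \frac{1}{H}\sum_{h=1}^{H}\big(\hat\Sigma(h)\big)^2 = \frac{1}{H}\sum_{h=1}^{H}\Big(\frac{1}{c-1}\sum_{j=1}^{c}\big(z_{(h,j)} - \bar z_{(h)}\big)^2\Big)^2\\
&= [nc(c-1)^2]^{-1}\sum_{h=1}^{H}\sum_{l=2}^{c}\sum_{j=1}^{l-1}\sum_{v=2}^{c}\sum_{u=1}^{v-1}\big(z_{(h,l)} - z_{(h,j)}\big)\big(z_{(h,l)} - z_{(h,j)}\big)^T\\
&\qquad\times\big(z_{(h,v)} - z_{(h,u)}\big)\big(z_{(h,v)} - z_{(h,u)}\big)^T.
\end{aligned}$$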
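The sketch below is a hypothetical NumPy illustration, not the authors' implementation. It computes the SAVE estimate (2.2) together with $\Lambda_n = \frac{1}{H}\sum_h(\hat\Sigma(h))^2$, and it checks numerically that the pairwise-difference expression above gives the same $\Lambda_n$ when $n = cH$. It assumes the divisor $c - 1$ in (2.1) and drops leftover points when $c$ does not divide $n$.

```python
import numpy as np
from itertools import combinations

def sliced(z, y, c):
    """Sort rows of z by y and reshape into (H, c, p) slices; leftover rows are dropped."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(y, kind="stable")
    H = z.shape[0] // c
    return z[order][: H * c].reshape(H, c, z.shape[1])

def save_matrix(z, y, c):
    """Return ((1/H) sum_h (I_p - Sigma_hat(h))^2, Lambda_n) as in (2.2)."""
    zs = sliced(z, y, c)
    dev = zs - zs.mean(axis=1, keepdims=True)
    sig = np.einsum("hja,hjb->hab", dev, dev) / (c - 1)   # Sigma_hat(h) from (2.1)
    lam_n = np.mean(sig @ sig, axis=0)                     # (1/H) sum_h Sigma_hat(h)^2
    save = np.eye(zs.shape[2]) - 2.0 * sig.mean(axis=0) + lam_n
    return save, lam_n

def lambda_n_pairwise(z, y, c):
    """Lambda_n from the quadruple-sum form with factor [n c (c - 1)^2]^{-1}, n = c H."""
    zs = sliced(z, y, c)
    H, _, p = zs.shape
    total = np.zeros((p, p))
    for h in range(H):
        s = np.zeros((p, p))
        for j, l in combinations(range(c), 2):   # all index pairs j < l
            d = zs[h, l] - zs[h, j]
            s += np.outer(d, d)
        total += s @ s                            # product of the two double sums
    return total / (H * c * c * (c - 1) ** 2)
```

The agreement rests on the identity $\sum_{j<l}(z_{(h,l)} - z_{(h,j)})(z_{(h,l)} - z_{(h,j)})^T = c(c-1)\hat\Sigma(h)$.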



For the sake of convenience, we introduce some notation here. For a symmetric $p \times p$ matrix $D = (d_{ij})$, $\mathrm{vech}(D) = (d_{11}, \dots, d_{p1}, d_{22}, \dots, d_{p2}, \dots, d_{pp})^T$ is the $\frac{p(p+1)}{2} \times 1$ vector constructed from the elements of $D$ on and below the diagonal.
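For concreteness, here is a small NumPy version of the $\mathrm{vech}$ operator as defined above; the function name is ours, chosen for illustration only.

```python
import numpy as np

def vech(D):
    """Stack the lower triangle of a symmetric p x p matrix column by column:
    (d_11, ..., d_p1, d_22, ..., d_p2, ..., d_pp)^T, of length p (p + 1) / 2."""
    D = np.asarray(D)
    p = D.shape[0]
    return np.concatenate([D[j:, j] for j in range(p)])
```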

We now define the total variation of order $r$ for a function. Let $\Pi_n(K)$ be the collection of $n$-point partitions $-K \le y_{(1)} \le \cdots \le y_{(n)} \le K$ of the closed interval $[-K, K]$, where $K > 0$ and $n \ge 1$. Any vector-valued or real-valued function $f(y)$ is said to have a total variation of order $r$ if, for any fixed $K > 0$,

$$\lim_{n\to\infty}\frac{1}{n^{r}}\sup_{\Pi_n(K)}\sum_{i=1}^{n-1}\big\|f(y_{(i+1)}) - f(y_{(i)})\big\| = 0.$$
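The quantity inside the limit can be evaluated for any single partition, which gives a lower bound for the supremum over all $n$-point partitions; the hypothetical helper below does exactly that and does not attempt the supremum. For a monotone real-valued function on $[-K, K]$ the sum telescopes to at most $|f(K) - f(-K)|$, so such a function has a total variation of order $r$ for every $r > 0$.

```python
import numpy as np

def partition_variation(f, points, r):
    """n^{-r} * sum_i ||f(y_(i+1)) - f(y_(i))|| for one sorted n-point partition.

    f may be real- or vector-valued; the definition in the text takes the
    supremum of this quantity over all n-point partitions of [-K, K].
    """
    ys = np.sort(np.asarray(points, dtype=float))
    vals = np.array([np.atleast_1d(f(t)) for t in ys], dtype=float)   # shape (n, q)
    jumps = np.linalg.norm(np.diff(vals, axis=0), axis=1)            # ||f(y_(i+1)) - f(y_(i))||
    return jumps.sum() / len(ys) ** r
```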

For any vector-valued or real-valued function $f(y)$, if there are a nondecreasing real-valued function $M$ and a real number $K_0$ such that, for any two points $y_1$ and $y_2$ both in $(-\infty, -K_0]$ or both in $[K_0, +\infty)$,

$$\|f(y_1) - f(y_2)\| \le |M(y_1) - M(y_2)|,$$

then we say that the function $f(y)$ is nonexpansive in the metric of $M$ on both sides of $K_0$.
2.1. When is SAVE not $\sqrt{n}$ consistent? Let $m(y) = E(z|Y = y)$. We can write $z = \varepsilon + m(y)$, where $E(\varepsilon|y) = 0$, and then $\Lambda = E[(\Sigma_{z|Y})^2] = E[(E(\varepsilon\varepsilon^T|Y))^2]$. The conditional expectation of $\varepsilon$ given $y$ equals zero and, more importantly, given the $y_i$, the $\varepsilon_i$ are independent, although they are not identically distributed (see [14] or [27]). Analogously to $\Lambda_n$, we denote

$$\begin{aligned}
\tilde\Lambda_n = [nc(c-1)^2]^{-1}\sum_{h=1}^{H}\sum_{l=2}^{c}\sum_{j=1}^{l-1}\sum_{v=2}^{c}\sum_{u=1}^{v-1}&\big(\varepsilon_{(h,l)} - \varepsilon_{(h,j)}\big)\big(\varepsilon_{(h,l)} - \varepsilon_{(h,j)}\big)^T\\
&\times\big(\varepsilon_{(h,v)} - \varepsilon_{(h,u)}\big)\big(\varepsilon_{(h,v)} - \varepsilon_{(h,u)}\big)^T.
\end{aligned}$$

Let $J_n = \Lambda_n - \tilde\Lambda_n$. To prove the convergence of $\Lambda_n$, we need to investigate $\tilde\Lambda_n$ and $J_n$.

Theorem 2.1. Assume the following four conditions:

(1) There is a nonnegative number $a$ such that $E(\|z\|^{8+a}) < \infty$.

(2) The inverse regression function $m(y)$ has a total variation of order $r > 0$.

(3) $m(y)$ is nonexpansive in the metric of $M(y)$ on both sides of a positive number $B_0$ such that

$$M^{8+a}(t)P(Y > t) \to 0 \quad \text{as } t \to \infty.$$

(4) $c \sim n^b$ for $b > 0$.

Then $n^{\beta}J_n = o_p(1)$ for any $\beta$ such that $\beta + b + \max\{\cdots + r, \cdots\} < 1$.

Remark 2.1. We note that the conditions are similar to those that ensure the consistency of the estimator for SIR, except for the higher moments of $z$ (see [27]). The $\sqrt{n}$ consistency of $J_n$ corresponds to $\beta = 0.5$, and hence we must have $0 < b < 1/2 - \max\{\cdots + r, \cdots\}$. When $r$ is close to zero and all moments exist, $c$ can be selected to be arbitrarily close to $\sqrt{n}$.



Theorem 2.2. Assume the following conditions:

(1) There is a nonnegative number $a$ such that $E(\|z\|^{\max\{8+a,\,12\}}) < \infty$.

(2) Let $m_1(y) = E(\varepsilon\varepsilon^T|Y = y)$. The function $m_1(y)$ has a total variation of order $r_1 > 0$.

(3) For a nondecreasing continuous function $M_1(\cdot)$, $m_1(y)$ is nonexpansive in the metric of $M_1(y)$ on both sides of a positive number $B'$ such that

$$M_1^{4+a/2}(t)P(Y > t) \to 0 \quad \text{as } t \to \infty.$$

(4) Let $m_2(y) = E((\varepsilon\varepsilon^T)^2|Y = y)$. For a nondecreasing continuous function $M_2(\cdot)$, $m_2(y)$ is nonexpansive in the metric of $M_2(y)$ on both sides of a positive number $B$ such that

$$M_2^{2+a/4}(t)P(Y > t) \to 0 \quad \text{as } t \to \infty.$$

(5) There exists a positive $\rho_1$ such that

$$\lim_{d\to\infty}\limsup_{n\to\infty}E\big(M_1^2(y_{(n)})\,I(|M_1(y_{(n)})| > d)\big) = o(n^{\rho_1}).$$

(6) There exists a positive $\rho_2$ such that

$$\lim_{d\to\infty}\limsup_{n\to\infty}E\big(|M_2(y_{(n)})M_1^2(y_{(n)})|\,I(|M_2(y_{(n)})| > d)\big) = o(n^{\rho_2}).$$

Then

$$E(\tilde\Lambda_n) = \Big(1 - \frac{c-2}{c(c-1)}\Big)\Lambda + \frac{1}{c}E[(\varepsilon\varepsilon^T)^2] + O\big(cn^{-1+\max\{r_1,\,1/(4+a/2),\,\rho_1\}}\big). \tag{2.3}$$

On the further assumption that $c \sim n^b$ for $b > 0$, we have

$$n^{\beta}(\tilde\Lambda_n - \Lambda) = o_p(1) \tag{2.4}$$

for any $\beta$ such that $\beta + b + \max\{r_1, 1/(4+a/2), \rho_1\} < 1$, $\beta < b$, and $2\beta + b + \max\{2r_1, \cdots, \rho_2\} < 2$.

Remark 2.2. The first three conditions in Theorem 2.2 are similar to those in Theorem 2.1. Condition (2) is similar to the condition on the inverse regression function because we deal with the conditional second moment of $\varepsilon$ when SAVE is applied. Condition (3) is slightly weaker than the existence of the $(4 + a/2)$th moment of $M_1(\cdot)$ or, equivalently, the $(8 + a)$th moment of $z$, and the same holds for Condition (4). Note that Condition (5) is slightly stronger than $M_1^2(y_{(n)}) = o_p(n^{\rho_1})$ because we have to handle the moment convergence. It is well known that when the $y_i$ follow an exponential distribution, the maximum $y_{(n)}$ can be bounded by $(\log n)^{c}$ in probability for some $c > 1$ (see, e.g., [2], Chapter 1, page 10), and when the support of $y_i$ is bounded, $y_{(n)}$ is simply bounded by a constant. Note that for any transformation $h(\cdot)$ of $y$, $h(y)$ is independent of $z$ given $B^T z$. Therefore, we could construct a transformation such that $h(y)$ has bounded support and consider the $(z_i, h(y_i))$'s. However, in this paper we do not consider any transformations of $y$.

Remark 2.3. From Theorems 2.1 and 2.2, we know that when $c$ is a fixed constant, $J_n = o_p(1)$, but the mean of $\tilde\Lambda_n$ is not asymptotically equal to $\Lambda$. From the proof of Theorem 2.2, we can easily see that $\tilde\Lambda_n$ does not converge in probability to $\Lambda$, and therefore $\Lambda_n = J_n + \tilde\Lambda_n$ cannot converge to $\Lambda$. When $c$ tends to infinity at a rate slower than $n^{1/2}$ in Theorems 2.1 and 2.2, the bias of $\Lambda_n$ is of order $1/c$, which is larger than $n^{-1/2}$, and therefore $\sqrt{n}$ consistency does not hold. This property is completely different from that of SIR because, within this range of $c$, the slicing estimator of SIR is $\sqrt{n}$ consistent (see [27]). When $r_1 = 0$ and $a = \infty$, multiplying the second and third terms in $E(\tilde\Lambda_n)$ by $\sqrt{n}$ gives two bounds, $\sqrt{n}/c$ and $c/\sqrt{n}$, which are reciprocals of each other. Although the third term is an upper bound, it is tight to a certain extent. An example is provided by the case where $y$ is uniformly distributed on $[0, 1]$: with large probability $y_{(i)} \approx i/n$, so the third term can achieve the rate $cn^{-1}$. This means that, in general, if no extra conditions are imposed, it is impossible for $\sqrt{n}(E(\tilde\Lambda_n) - \Lambda)$ to converge to zero, as can be seen from the proof of the theorem. This deserves a detailed investigation, as it relates to the question of whether the slicing estimator of SAVE is $\sqrt{n}$ consistent. In the following subsection, we undertake a detailed study of this issue.

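When the mean and covariance of $x$ are unknown, the $\hat z_j = \hat\Sigma_x^{-1/2}(x_j - \bar x)$ are used to estimate the matrix $E(I_p - \Sigma_{z|Y})^2$. Let $\hat\Sigma^*(h)$ be the sample covariance of the $\hat z_j$'s in each slice for $h = 1, \dots, H$. Note that this matrix is location-invariant, so we can assume, with no loss of generality, that the sample mean $\bar x = 0$. Clearly, $\hat\Sigma^*(h) = \hat\Sigma_x^{-1/2}\Sigma_x^{1/2}\hat\Sigma(h)\Sigma_x^{1/2}\hat\Sigma_x^{-1/2}$. To study the asymptotic behavior of the estimator when $\Sigma_x$ is replaced by $\hat\Sigma_x$, we first consider the following property. Let $R = (\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1}$. By some elementary calculation and the well-known fact that $\hat\Sigma_x - \Sigma_x = O_p(1/\sqrt{n})$, we have

$$\begin{aligned}
\hat\Sigma_x^{-1/2}\Sigma_x^{1/2} &= I_p - (\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1}\big[(I_p + R)^{-1}\big((I_p + R)^{-1/2} + I_p\big)^{-1}\big]\\
&= I_p - \tfrac{1}{2}(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1} + o_p(1/\sqrt{n}),
\end{aligned}\tag{2.5}$$

$$\Sigma_x^{1/2}\hat\Sigma_x^{-1/2} = I_p - \tfrac{1}{2}\Sigma_x^{-1}(\hat\Sigma_x - \Sigma_x) + o_p(1/\sqrt{n}). \tag{2.6}$$

Consequently, for each $h = 1, \dots, H$,

$$\begin{aligned}
\hat\Sigma^*(h) &= \hat\Sigma_x^{-1/2}\Sigma_x^{1/2}\hat\Sigma(h)\Sigma_x^{1/2}\hat\Sigma_x^{-1/2}\\
&= \hat\Sigma(h) - \tfrac{1}{2}(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1}\hat\Sigma(h) - \tfrac{1}{2}\hat\Sigma(h)\Sigma_x^{-1}(\hat\Sigma_x - \Sigma_x) + o_p(1/\sqrt{n}),
\end{aligned}\tag{2.7}$$

and then, similarly,

$$\begin{aligned}
\frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma^*(h)\big)^2 &= \frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma(h)\big)^2\\
&\quad + \frac{1}{2H}\sum_{h=1}^{H}\Big\{\big(I_p - \hat\Sigma(h)\big)\big[(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1}\hat\Sigma(h) + \hat\Sigma(h)\Sigma_x^{-1}(\hat\Sigma_x - \Sigma_x)\big]\\
&\qquad + \big[(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1}\hat\Sigma(h) + \hat\Sigma(h)\Sigma_x^{-1}(\hat\Sigma_x - \Sigma_x)\big]\big(I_p - \hat\Sigma(h)\big)\Big\} + o_p(1/\sqrt{n})\\
&=: \frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma(h)\big)^2 + I_n + o_p(1/\sqrt{n}).
\end{aligned}\tag{2.8}$$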



We now deal with $I_n$. Write $(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1} = A_n = (a_{n,ij})$, $\hat\Sigma(h) = B_n(h) = (b_{n,ij}(h))$ and $I_p - \hat\Sigma(h) = C_n(h) = (c_{n,ij}(h))$. Then $\sqrt{n}I_n$ can be written as

$$\sqrt{n}I_n = \frac{1}{2H}\sum_{h=1}^{H}\Big\{\sqrt{n}\big(A_nB_n(h) + B_n(h)A_n^T\big)C_n(h) + C_n(h)\sqrt{n}\big(A_nB_n(h) + B_n(h)A_n^T\big)\Big\},$$

and its elements have the form

$$\sqrt{n}I_{n,il} = \sum_{j=1}^{p}\sum_{k=1}^{p}\sqrt{n}\,a_{n,jk}D_{n,iljk}, \tag{2.9}$$

where

$$D_{n,iljk} = \frac{1}{2H}\sum_{h=1}^{H}\Big[\delta_{ij}\big(B_n(h)C_n(h)\big)_{kl} + b_{n,ik}(h)c_{n,jl}(h) + c_{n,ij}(h)b_{n,kl}(h) + \delta_{lj}\big(C_n(h)B_n(h)\big)_{ik}\Big]$$

and $\delta_{ij}$ is the Kronecker delta. From the proofs of Theorems 2.1 and 2.2 in the Appendix, $D_{n,iljk}$ converges in probability to a constant $D_{iljk}$. The well-known asymptotic theory of the sample covariance yields the asymptotic normality of all of the $\sqrt{n}a_{n,jk}$. Thus, $\sqrt{n}I_{n,il}$ converges in distribution to $N(0, V_{il})$, where $V_{il} = \lim_{n\to\infty}\mathrm{var}\big(\sum_{j=1}^{p}\sum_{k=1}^{p}\sqrt{n}\,a_{n,jk}D_{iljk}\big)$. This means that $I_{n,il} = O_p(1/\sqrt{n})$, and we have the following result.



Corollary 2.1. Under the conditions of Theorems 2.1 and 2.2, the results of these two theorems continue to hold when the mean and covariance of $x$ are unknown and the $\hat z_i = \hat\Sigma_x^{-1/2}(x_i - \bar x)$ are used to estimate the matrix $E(I_p - \Sigma_{z|Y})^2$.
This corollary holds because $I_n$ converges faster than $\Lambda_n$, and thus the results of Theorems 2.1 and 2.2 do not change.

2.2. When is SAVE $\sqrt{n}$ consistent? The following theorem asserts the asymptotic normality of the estimator in the special case in which the response is discrete and takes finitely many values. For any value $l$, define $E_1(l) = E(z|Y = l)$ and

$$\begin{aligned}
V(Y, z) = \sum_{l=1}^{d}\Big[&-2\Big(\big(z^2 - 2zE_1^T(l)\big)I(Y = l) - E\big(\big(z^2 - 2zE_1^T(l)\big)I(Y = l)\big)\Big)\big(I_p - \mathrm{Cov}(z|Y = l)\big)\\
&+ \big(I(Y = l) - p_l\big)\big(I_p - \mathrm{Cov}(z|Y = l)\big)^2\Big].
\end{aligned}$$

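Theorem 2.3. Assume that the response $Y$ takes $d$ values and, without loss of generality, that $Y = 1, 2, \dots, d$ with $P(Y = l) = p_l > 0$ for $l = 1, \dots, d$. Additionally, assume that $E\|z\|^8 < \infty$. Then, when $H = d$,

$$\sqrt{n}\,\mathrm{vech}\Big(\frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma(h)\big)^2 - E\big(I_p - \Sigma_{z|Y}\big)^2\Big) \stackrel{d}{\longrightarrow} N\big(0, \mathrm{Cov}(\mathrm{vech}\{V(Y, z)\})\big).$$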

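When the $\hat z_j$ are used to estimate the SAVE matrix, the term $\sqrt{n}I_n$ affects the limiting variance. Note that

$$\begin{aligned}
(\hat\Sigma_x - \Sigma_x)\Sigma_x^{-1} &= \frac{1}{n}\sum_{m=1}^{n}\big[(x_m - E(x))^2 - \Sigma_x\big]\Sigma_x^{-1} + o_p(1/\sqrt{n})\\
&=: \frac{1}{n}\sum_{m=1}^{n}\big(e_{m,jk}\big)_{1\le j,k\le p} + o_p(1/\sqrt{n}).
\end{aligned}\tag{2.10}$$

The leading term is a sum of i.i.d. random matrices, which implies that each $a_{n,jk}$ is asymptotically a sum of i.i.d. random variables. Then, from (2.9),

$$\begin{aligned}
\sqrt{n}\big(I_{n,il}\big)_{1\le i,l\le p} &= \frac{1}{\sqrt{n}}\sum_{m=1}^{n}\Big(\sum_{j=1}^{p}\sum_{k=1}^{p}e_{m,jk}D_{n,iljk}\Big)_{1\le i,l\le p} + o_p(1)\\
&=: \frac{1}{\sqrt{n}}\sum_{m=1}^{n}E_m + o_p(1),
\end{aligned}\tag{2.11}$$

where $E_m = \big(\sum_{j=1}^{p}\sum_{k=1}^{p}e_{m,jk}D_{iljk}\big)_{1\le i,l\le p}$ uses the limiting constants $D_{iljk}$.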

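Corollary 2.2. Under the conditions of Theorem 2.3,

$$\sqrt{n}\,\mathrm{vech}\Big(\frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma^*(h)\big)^2 - E\big(I_p - \Sigma_{z|Y}\big)^2\Big) \stackrel{d}{\longrightarrow} N\big(0, \mathrm{Cov}(\mathrm{vech}\{V(Y, z) + E_1\})\big),$$

where $\hat\Sigma^*(h)$ is the slice covariance matrix of the $\hat z_j$'s.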
3. The approximation and bias correction.

3.1. The approximation. Note that when Y is a discrete random vari- 
able, SAVE needs only very mild conditions to achieve asymptotic normality. 
In this case, H is a fixed number that does not depend on n. In applications, 
H is often a fixed number, which means that approximation via discretiza- 
tion is used in practice. It would be worthwhile to conduct a theoretical 
investigation to ascertain the rationale of the approximation. 

Let $S_h = (q_{h-1}, q_h]$ for $h = 1, \dots, H$, with $q_0 = -\infty$ and $q_H = \infty$, and let $p_h = P(Y \in S_h)$. Recall that the slicing estimator is constructed as a weighted sum of the sample covariance matrices of the $z_i$'s whose $y_i$'s fall in each slice $S_h$, $h = 1, \dots, H$. These sample covariance matrices are estimators of the $\mathrm{Cov}(z|Y \in S_h)$'s. Note that these matrices can be written as

$$\Sigma(h) := \frac{E\big((z - E(z|Y \in S_h))^2 I(Y \in S_h)\big)}{p_h},$$

where $I(\cdot)$ is the indicator function. The estimator of $p_h$ is equal to $1/H$ when $q_h$ is replaced by the empirical quantile $\hat q_h$. The slicing estimator can be rewritten as $I_p - \frac{2}{H}\sum_{h=1}^{H}\hat\Sigma(h) + \frac{1}{H}\sum_{h=1}^{H}\hat\Sigma^2(h)$ with

$$\hat\Sigma(h) = \frac{1}{c}\sum_{j=1}^{c}\big(z_{(h,j)} - \bar z_{(h)}\big)^2. \tag{3.1}$$

That is, the slicing estimator estimates $\Lambda(H) = \sum_{h=1}^{H}(I_p - \Sigma(h))^2p_h$. In the case in which $Y$ is continuous and $H$ is large, we have

$$\Lambda(H) \doteq \sum_{h=1}^{H}E\big[(I_p - \mathrm{Cov}(z|Y))^2 I(Y \in S_h)\big] = E(I_p - \mathrm{Cov}(z|Y))^2,$$

where $\doteq$ stands for approximate equality. Clearly, under some regularity conditions, $\Lambda(H)$ converges to $E((I_p - \mathrm{Cov}(z|Y))^2)$ as $H \to \infty$.

As with Theorem 2.3, we have the following result. Define, for every $h$, $E_1(h) = E(z|Y \in S_h)$, and let $f(q_j)$ denote the value of the density of $Y$ at $q_j$.

Theorem 3.1. Let $\hat q_h = y_{(ch)}$, $h = 1, \dots, H - 1$, be the empirical $(h/H)$th quantiles, with $\hat q_0 = -\infty$ and $\hat q_H = \infty$. Assume the following:
(1) $E\|z\|^8 < \infty$.

(2) If we write $E(F(Y, z, a, b)) := E\{z^2(I(Y \in (a, b]) - I(Y \in S_h))\}$, then $E(F(Y, z, a, b))$ is differentiable with respect to $a$ and $b$, and its first derivative is bounded by a constant $C_1$.

(3) If we write $E(G(Y, z, a, b)) := E(z(I(Y \in (a, b]) - I(Y \in S_h)))$, then $E(G(Y, z, a, b))$ is differentiable with respect to $a$ and $b$.

(4) The density function $f(y)$ of $Y$ is bounded away from zero at all quantiles $q_h$, $h = 1, \dots, H - 1$.

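When $\Lambda_n$ is constructed with the slices $\hat S_h = (\hat q_{h-1}, \hat q_h]$, $h = 1, \dots, H$, then, as $n \to \infty$,

$$\sqrt{n}\,\mathrm{vech}\Big(\frac{1}{H}\sum_{h=1}^{H}\big(I_p - \hat\Sigma(h)\big)^2 - \sum_{h=1}^{H}p_h\big(I_p - \Sigma(h)\big)^2\Big)$$

is asymptotically normal with zero mean and variance $\mathrm{Cov}(\mathrm{vech}\{L(Y, z)\})$. When the $\hat z_i$ are used to construct the estimator, the limiting variance is $\mathrm{Cov}(\mathrm{vech}\{L(Y, z) + E_1\})$, where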

L(Y,z) = (-2 ]T ((z 2 - 2zE l {h))I(Y G S h ) - B((z 2 - 2zE 1 {h))I(Y G S h ))) 

I h=l 



2 ^ ( -I(Y < q h -i) + V !( Y < + & 



/i=l 



x (F / (^_i,%)-2G"(^_ 1 ,^)ii; 1 (/ l ))| x (J p -S(^)) 
and $E_1$ is defined as in (2.11).

Remark 3.1. Conditions (2)-(4) are assumed in order to ensure some 
degree of smoothness of the relevant functions, and thus the conditions are 
fairly mild. 

3.2. Bias correction. Examining the expectation of $\tilde\Lambda_n$, we can see that the major bias is the term $\frac{1}{c}E(\varepsilon\varepsilon^T)^2$. If we can eliminate the impact of this term, then asymptotic normality may be possible. In this subsection, we suggest a bias correction whose idea is simple: we first obtain an estimator of this term and then subtract it from $\Lambda_n$. This motivates the following bias correction.

As before, we divide the range of $Y$ into $H$ slices. According to the result of Theorem 2.2, the estimator of $V := E(\varepsilon\varepsilon^T)^2$ is defined as

$$V_n = \frac{1}{n}\sum_{h=1}^{H}\sum_{j=1}^{c}\Big(\big(z_{(h,j)} - \bar z_{(h)}\big)\big(z_{(h,j)} - \bar z_{(h)}\big)^T\Big)^2.$$
h=lj=l 
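The corrected estimator of $\Lambda$ is

$$\Lambda_n^{C} = \frac{c(c-1)}{(c-1)^2 + 1}\Lambda_n - \frac{c-1}{(c-1)^2 + 1}V_n.$$

This form follows from (2.3): because $1 - \frac{c-2}{c(c-1)} = \frac{(c-1)^2 + 1}{c(c-1)}$, the estimator equals $\frac{c(c-1)}{(c-1)^2 + 1}\big(\Lambda_n - \frac{1}{c}V_n\big)$, which subtracts the estimated $1/c$ bias term and rescales the coefficient of $\Lambda$ in (2.3) back to one.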
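A minimal NumPy sketch of $V_n$ and of the corrected estimator of $\Lambda$ displayed above follows. It assumes equal slices of $c$ points sorted by $y$ with leftover points dropped, the divisor $c - 1$ in $\hat\Sigma(h)$, and the normalization $1/n$ in $V_n$; the function name is hypothetical. As a sanity check it can be run in a case where the truth is known: if $z \sim N(0, I_p)$ independently of $Y$, then $\Sigma_{z|Y} = I_p$ and $\Lambda = I_p$, and the corrected estimate should then be closer to $I_p$ than the uncorrected $\Lambda_n$.

```python
import numpy as np

def bias_corrected_lambda(z, y, c):
    """Return (corrected estimate of Lambda, Lambda_n, V_n) for slices of c points sorted by y."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(y, kind="stable")
    H, p = z.shape[0] // c, z.shape[1]
    zs = z[order][: H * c].reshape(H, c, p)
    dev = zs - zs.mean(axis=1, keepdims=True)               # z_(h,j) - zbar_(h)
    sig = np.einsum("hja,hjb->hab", dev, dev) / (c - 1)      # Sigma_hat(h)
    lam_n = np.mean(sig @ sig, axis=0)                       # Lambda_n
    outer = np.einsum("hja,hjb->hjab", dev, dev)             # e e^T for every observation
    v_n = np.sum(outer @ outer, axis=(0, 1)) / (H * c)       # V_n with n = H c
    denom = (c - 1) ** 2 + 1
    lam_c = c * (c - 1) / denom * lam_n - (c - 1) / denom * v_n
    return lam_c, lam_n, v_n
```

For $c = 2$ the two weights are $1$ and $1/2$, so the corrected estimate reduces to $\Lambda_n - V_n/2$.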

Theorem 3.2. Assume that conditions (2)-(3) of Theorem 2.1 and conditions (1)-(6) of Theorem 2.2 are satisfied. Let $c \sim n^b$, where $b$ is a positive number that satisfies the following three inequalities:

(a) $b > \frac{1}{4}$;

(b) $b < 0.5 - \max\{\rho_1, r_1, \cdots\}$;

(c) $b < 1 - \max\{2r_1, \cdots, \rho_2\}$.

Then $\frac{\sqrt{n}}{c}\mathrm{vech}(V_n - V) = o_p(1)$ and therefore $\sqrt{n}\,\mathrm{vech}(\Lambda_n^{C} - \Lambda) = O_p(1)$. The results continue to hold when the $\hat z_i$'s are used to construct the estimators.

As in (2.9), the terms involving $\hat\Sigma_x - \Sigma_x = O_p(1/\sqrt{n})$ are themselves $O_p(1/\sqrt{n})$, so the $V_n$ based on the $\hat z_i$'s differs from the $V_n$ based on the $z_i$'s by a term that is $O_p(1/\sqrt{n})$. Thus, the estimators based on the $\hat z_i$'s have the same asymptotic behavior as those based on the $z_i$'s.

To show the $\sqrt{n}$ consistency of the estimated CDR subspace, we define a bias-corrected estimator of the matrix $E(I_p - \Sigma_{z|Y})^2$ by

$$\mathrm{CSAVE}_n := I_p - \frac{2}{H}\sum_{h=1}^{H}\hat\Sigma(h) + \Lambda_n^{C}.$$

The eigenvectors associated with the largest $k$ eigenvalues of $\mathrm{CSAVE}_n$ are used to form a basis of the estimated CDR space. The following result asserts the asymptotic normality of the corrected estimator.
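Putting the pieces together, the following self-contained sketch forms $\mathrm{CSAVE}_n$ and returns the eigenvectors for the $k$ largest eigenvalues as a basis estimate. The function name is hypothetical, and it assumes equal slices of $c$ points sorted by $y$ (leftovers dropped), the divisor $c - 1$ in $\hat\Sigma(h)$ and the normalization $1/n$ in $V_n$. The usage lines simulate Model 2 of Section 4 with a smaller dimension; they are illustrative only and do not reproduce the paper's simulation settings.

```python
import numpy as np

def csave_directions(z, y, c, k):
    """Eigenvectors (columns) of CSAVE_n for its k largest eigenvalues, and those eigenvalues."""
    z = np.asarray(z, dtype=float)
    order = np.argsort(y, kind="stable")
    H, p = z.shape[0] // c, z.shape[1]
    zs = z[order][: H * c].reshape(H, c, p)
    dev = zs - zs.mean(axis=1, keepdims=True)
    sig = np.einsum("hja,hjb->hab", dev, dev) / (c - 1)      # Sigma_hat(h)
    lam_n = np.mean(sig @ sig, axis=0)                       # Lambda_n
    outer = np.einsum("hja,hjb->hjab", dev, dev)
    v_n = np.sum(outer @ outer, axis=(0, 1)) / (H * c)       # V_n
    denom = (c - 1) ** 2 + 1
    lam_c = c * (c - 1) / denom * lam_n - (c - 1) / denom * v_n
    csave = np.eye(p) - 2.0 * sig.mean(axis=0) + lam_c       # CSAVE_n
    evals, evecs = np.linalg.eigh(csave)                     # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:k]
    return evecs[:, top], evals[top]

# Illustrative use: Model 2, y = (beta^T z)^2 + eps with beta = e_1 and p = 5.
rng = np.random.default_rng(4)
n, p = 2000, 5
z = rng.standard_normal((n, p))
y = z[:, 0] ** 2 + rng.standard_normal(n)
B_hat, lam = csave_directions(z, y, c=20, k=1)
print(B_hat[:, 0].round(2))
```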

Corollary 3.1. Under the conditions of Theorem 3.2,

$$\sqrt{n}\,\mathrm{vech}\big(\mathrm{CSAVE}_n - E((I_p - \Sigma_{z|Y})^2)\big)$$

is asymptotically multinormal with zero mean and finite variance $(A_1 + A_2)$, where $A_1$ and $A_2$ are defined in (A.17) and (A.19), respectively. When the $\hat z_i$ are used to construct $\mathrm{CSAVE}_n$, the limiting variance is $(A_1 + A_2 + E_1)$, where $E_1$ is the random matrix defined in (2.11).

3.3. The consistency of estimated eigenvalues and eigenvectors. As the CDR space is estimated by the space spanned by the eigenvectors associated with the nonzero eigenvalues of the estimated SAVE matrix, we present the convergence of the estimated eigenvalues and eigenvectors. From the theorems and corollaries in this section, the asymptotic normality of the eigenvalues and the corresponding eigenvectors can be derived by perturbation theory. The following result parallels the result for SIR obtained by Zhu and Fang [25] and Zhu and Ng [27]. Because it is a direct extension of those results and the proof is almost identical to that for the SIR matrix estimator, we omit the details of the proof.

Let $\lambda_1(A) \ge \lambda_2(A) \ge \cdots \ge \lambda_p(A) \ge 0$ and $b_i(A) = (b_{1i}(A), \dots, b_{pi}(A))^T$, $i = 1, \dots, p$, denote the eigenvalues and the corresponding eigenvectors of a $p \times p$ matrix $A$. Let $\Lambda = E(I_p - \Sigma_{z|Y})^2$ and let $\Lambda_n$ be the estimator defined in the theorems and corollaries of Section 3.

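Theorem 3.3. In addition to the conditions of the respective theorems in this section, assume that the nonzero $\lambda_i(\Lambda)$'s are distinct. Then, for each nonzero eigenvalue $\lambda_i(\Lambda)$ and the corresponding eigenvector $b_i(\Lambda)$, we have

$$\begin{aligned}
\sqrt{n}\big(\lambda_i(\Lambda_n) - \lambda_i(\Lambda)\big) &= \sqrt{n}\,b_i(\Lambda)^T(\Lambda_n - \Lambda)b_i(\Lambda) + o_p\big(\sqrt{n}\|\Lambda_n - \Lambda\|\big)\\
&\stackrel{d}{\longrightarrow} b_i(\Lambda)^TWb_i(\Lambda),
\end{aligned}\tag{3.2}$$

where $W$ is the limit matrix of $\sqrt{n}(\mathrm{CSAVE}_n - E((I_p - \Sigma_{z|Y})^2))$ studied in Corollary 3.1, and, as $n \to \infty$,

$$\begin{aligned}
\sqrt{n}\big(b_i(\Lambda_n) - b_i(\Lambda)\big) &= \sqrt{n}\sum_{l=1,\,l\ne i}^{p}\frac{b_l(\Lambda)b_l(\Lambda)^T(\Lambda_n - \Lambda)b_i(\Lambda)}{\lambda_i(\Lambda) - \lambda_l(\Lambda)} + o_p\big(\sqrt{n}\|\Lambda_n - \Lambda\|\big)\\
&\stackrel{d}{\longrightarrow} \sum_{l=1,\,l\ne i}^{p}\frac{b_l(\Lambda)b_l(\Lambda)^TWb_i(\Lambda)}{\lambda_i(\Lambda) - \lambda_l(\Lambda)},
\end{aligned}\tag{3.3}$$

where, for a matrix $M = (m_{ij})$, $\|M\| = \sum_{1\le i,j\le p}|m_{ij}|$.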
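The first-order expansions in (3.2) and (3.3) are standard matrix perturbation formulas. The NumPy sketch below (hypothetical function name) evaluates them for a deterministic symmetric matrix and a small symmetric perturbation and compares them with an exact eigendecomposition; it illustrates the expansions only, not the limiting distributions.

```python
import numpy as np

def first_order_eigenpair(lam, delta, i):
    """First-order approximation of the i-th eigenpair of lam + delta (decreasing order).

    Assumes lam is symmetric and lambda_i(lam) is distinct from the other eigenvalues.
    Returns (approximate eigenvalue, approximate eigenvector, unperturbed eigenvector b_i).
    """
    evals, evecs = np.linalg.eigh(lam)
    order = np.argsort(evals)[::-1]                  # decreasing order, as in the text
    evals, evecs = evals[order], evecs[:, order]
    b_i = evecs[:, i]
    value = evals[i] + b_i @ delta @ b_i             # (3.2): lambda_i + b_i^T delta b_i
    vector = b_i.copy()
    for l in range(len(evals)):
        if l != i:                                   # (3.3): sum over l != i
            b_l = evecs[:, l]
            vector += b_l * (b_l @ delta @ b_i) / (evals[i] - evals[l])
    return value, vector, b_i
```

The approximation error is of second order in the size of the perturbation, which is what the $o_p(\sqrt{n}\|\Lambda_n - \Lambda\|)$ remainders express.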

4. Simulation study and applications. In this section, a simulation study is carried out to provide evidence for the performance of SIR, SAVE and the bias-corrected SAVE in practice. Following Li [16], the correlation coefficient between two spaces is taken as the measure of the distance between the estimated CDR space and the true CDR space $S_{y|z}$. For any eigenvector $\hat\beta_i$ associated with one of the largest $k$ eigenvalues of the estimated matrix, the squared multiple correlation coefficient $R^2(\hat\beta_i)$ between $\hat\beta_i^Tz$ and the ideally reduced variables $\beta_1^Tz, \dots, \beta_k^Tz$ of $S_{y|z}$ is employed to measure the distance between $\hat\beta_i$ and the space $S_{y|z}$. That is,

$$R^2(\hat\beta_i) = \max_{\beta \in S_{y|z}}\frac{(\hat\beta_i^T\Sigma_z\beta)^2}{\hat\beta_i^T\Sigma_z\hat\beta_i\cdot\beta^T\Sigma_z\beta}.$$

As $z$ is a standardized variable ($\Sigma_z = I_p$), for a unit vector $\hat\beta_i$, $R^2(\hat\beta_i)$ has the simpler formula

$$R^2(\hat\beta_i) = \max_{\beta \in S_{y|z},\,\|\beta\| = 1}(\hat\beta_i^T\beta)^2.$$

When the estimated CDR space has dimension $k$, for the collection of the $k$ eigenvectors $\hat\beta_i$, $i = 1, \dots, k$, associated with the $k$ largest eigenvalues, we use the squared trace correlation [the average of the squared canonical correlation coefficients between $\hat\beta_1^Tz, \dots, \hat\beta_k^Tz$ and $\beta_1^Tz, \dots, \beta_k^Tz$, denoted by $R^2(\hat B)$] as our criterion (see also [13]), where $\hat B$ is the space spanned by $\{\hat\beta_1, \dots, \hat\beta_k\}$.
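For standardized $z$, the squared trace correlation reduces to $R^2(\hat B) = \frac{1}{k}\mathrm{tr}(P_{\hat B}P_B)$, where $P$ denotes the orthogonal projection onto a column space, because the canonical correlations between $\hat B^Tz$ and $B^Tz$ are then the singular values of $\hat Q^TQ$ for orthonormal bases $\hat Q$ and $Q$ of the two spaces. A NumPy sketch, with a function name of our own choosing:

```python
import numpy as np

def squared_trace_correlation(B_hat, B):
    """R^2(B_hat) = tr(P_hat P) / k for standardized z; B_hat and B are p x k matrices."""
    def projection(M):
        Q, _ = np.linalg.qr(np.asarray(M, dtype=float))   # orthonormal basis of span(M)
        return Q @ Q.T
    k = np.asarray(B).shape[1]
    return float(np.trace(projection(B_hat) @ projection(B))) / k
```

For $k = 1$ and unit vectors this is $(\hat\beta^T\beta)^2$, matching the simpler formula for $R^2(\hat\beta_i)$ above.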

We consider the case $k = 1$ with $n = 200$ and $480$ and choose the following five models:

Model 1: $y = (\beta^Tz)^3 + \varepsilon$.

Model 2: $y = (\beta^Tz)^2 + \varepsilon$.

Model 3: $y = \beta^Tz \times \varepsilon$.

Model 4: $y = (\beta^Tz)^3 + (\beta^Tz) \times \varepsilon$.

Model 5: $y = \cos(\beta^Tz) + \varepsilon$.

In these models, the covariate $z$ and the error $\varepsilon$ are independent and follow the normal distributions $N(0, I_{10})$ and $N(0, 1)$, respectively, where $I_{10}$ is the $10 \times 10$ identity matrix. In performing the simulation, we set $\beta = (1, 0, \dots, 0)^T$.
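The five data-generating models can be written down directly. The sketch below (hypothetical function name and seed handling, NumPy) draws one Monte Carlo sample with $z \sim N(0, I_p)$, $\varepsilon \sim N(0, 1)$ independent of $z$, and $\beta = (1, 0, \dots, 0)^T$, with $p = 10$ by default as described above.

```python
import numpy as np

MODELS = {
    1: lambda u, e: u ** 3 + e,
    2: lambda u, e: u ** 2 + e,
    3: lambda u, e: u * e,
    4: lambda u, e: u ** 3 + u * e,
    5: lambda u, e: np.cos(u) + e,
}

def simulate(model, n, p=10, seed=None):
    """Return (z, y, beta) for Models 1-5, where u = beta^T z and beta = (1, 0, ..., 0)^T."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model}")
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, p))   # z ~ N(0, I_p)
    e = rng.standard_normal(n)        # epsilon ~ N(0, 1), independent of z
    beta = np.zeros(p)
    beta[0] = 1.0
    u = z @ beta
    return z, MODELS[model](u, e), beta
```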

We select Models 1 to 5 based on the following considerations. Model 1 favors SIR rather than SAVE because the regression function is strictly increasing. A similar investigation was undertaken in [28]. Model 2 favors SAVE rather than SIR because the inverse regression function is a zero function, so that $\dim(S_{E(z|y)}) = 0$, where $\dim(S)$ stands for the dimension of the space $S$. Model 3 deals with the variance function. Model 4 is constructed as a combination of Model 1 and Model 3, as we are curious about the performance of SIR and SAVE in relation to both the mean function and the variance function. We also include Model 5, which involves a periodic function.

The results are reported in Figure 1 and Table 1. When $n = 200$, simulations were conducted with $H = 2, 5, 10, 20$ and $50$, but we only report the results with $H = 10$ for illustration because, for practical use, $H = 10$ is a good choice for this sample size (see relevant references such as [5, 16, 28]). The sensitivity to the slice selection will be discussed in terms of the results reported in Table 1 with $n = 480$. The boxplots in Figure 1 show the distribution of $R^2$ over a total of 200 Monte Carlo samples and show how the bias correction works with a fairly small sample size. From Figure 1, it is clear that CSAVE works well and is robust across the models that we employ.

Table 1 displays the numerical results for n = 480. The median of R² over a total of 200 Monte Carlo samples is presented so that we can compare the efficiency of the methods. To check the impact of the number of slices H, the values 2, 6, 24 and 96 are considered, corresponding to c = n/H = 240, 80, 20 and 5 points per slice.

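For orientation, the following NumPy sketch computes the sample SAVE matrix in the form analyzed in Theorem 3.1, $\frac{1}{H}\sum_h (I_p - \hat\Sigma(h))^2$, on standardized covariates with $H$ equal-count slices, and extracts its leading eigenvector. Two choices here are assumptions rather than the paper's specification: standardization by the inverse square root of the sample covariance, and the `r_squared` criterion, taken here as the squared sample correlation between $\beta^T z$ and $\hat\beta^T z$. CSAVE is not sketched because its correction term is defined in Section 2, outside this excerpt.

```python
import numpy as np


def save_matrix(z, y, H):
    """Sample SAVE matrix (1/H) * sum_h (I_p - Sigma_hat(h))^2 on standardized z."""
    p = z.shape[1]
    zc = z - z.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(zc, rowvar=False))
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # Sigma_z^{-1/2}
    zs = zc @ inv_sqrt
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), H):   # H slices of about c = n / H points
        V = np.eye(p) - np.cov(zs[idx], rowvar=False)
        M += V @ V
    return M / H, inv_sqrt


def save_direction(z, y, H):
    """Leading eigenvector of the SAVE matrix, mapped back to the z scale and normalized."""
    M, inv_sqrt = save_matrix(z, y, H)
    _, U = np.linalg.eigh(M)          # eigenvalues in ascending order
    b = inv_sqrt @ U[:, -1]
    return b / np.linalg.norm(b)


def r_squared(z, beta, beta_hat):
    """Assumed criterion: squared sample correlation of beta^T z and beta_hat^T z."""
    return np.corrcoef(z @ beta, z @ beta_hat)[0, 1] ** 2


rng = np.random.default_rng(1)
n, p = 480, 10
z = rng.standard_normal((n, p))
beta = np.eye(p)[0]
y = (z @ beta) ** 2 + rng.standard_normal(n)   # Model 2
for H in (2, 6, 24, 96):                       # c = 240, 80, 20, 5
    print(H, round(r_squared(z, beta, save_direction(z, y, H)), 3))
```

Running the loop over several seeds and taking medians would mimic the layout of Table 1, but this sketch does not claim to reproduce its numbers.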
As expected, SIR is insensitive to c but sensitive to the model, and it does not work well when the regression function is even or when the CDR space is related to the error term.

The performance of SAVE is strongly affected by the choice of c, but 
when H is properly chosen, SAVE works very well. However, the range of c 
that results in a good performance from SAVE is fairly narrow. From the 
simulation results, we can see that when H = 96, that is, when c = 5, SAVE 
does not perform well. This is consistent with the theoretical conclusions in 
Section 2. The simulations show that choosing a relatively small H favors 
SAVE, but that CSAVE still outperforms SAVE. Specifically, for H = 2, 6, 24 and 96, the R² of CSAVE is larger than that of SAVE, especially when H is large. Although the performance of CSAVE is also influenced by the choice of c, the range of c over which CSAVE works well is wider than that for SAVE. Because CSAVE removes, to some extent, the uncertainty about which c should be used in practice, we recommend this method. Based on the limited simulations, H = n/20 is recommended for practical use.

[Figure 1: five panels, Model 1 to Model 5, each showing boxplots for SAVE, SIR and CSAVE.]

Fig. 1. Boxplots of the distribution of 200 replicates of the R² values for models 1-5 when H = 10 and n = 200. The boxplots are, from left to right, for SAVE, SIR and CSAVE.

Table 1
The empirical median of the R² with n = 480

Model     Method   H = 2    H = 6    H = 24   H = 96
Model 1   SAVE     0.7521   0.9599   0.0099   0.0009
          SIR      0.9442   0.9681   0.9714   0.9586
          CSAVE    0.8023   0.9687   0.9539   0.0122
Model 2   SAVE     0.9539   0.9523   0.9187   0.7225
          SIR      0.0460   0.0386   0.0443   0.0435
          CSAVE    0.9575   0.9584   0.9317   0.8487
Model 3   SAVE     0.0724   0.9201   0.8517   0.3547
          SIR      0.0586   0.0545   0.0564   0.0448
          CSAVE    0.0654   0.9336   0.8854   0.6393
Model 4   SAVE     0.0741   0.9055   0.8665   0.3059
          SIR      0.8656   0.8952   0.8825   0.7263
          CSAVE    0.1066   0.9277   0.9024   0.7097
Model 5   SAVE     0.8750   0.8657   0.6741   0.1249
          SIR      0.0581   0.0484   0.0558   0.0625
          CSAVE    0.8851   0.8966   0.7639   0.2517

APPENDIX 

As the proofs are rather tedious, in this section we only present outlines; 
readers can refer to Li and Zhu [18] for the details. 

A.1. Proofs of the theorems in Section 2.

Proof of Theorem 2.1. We first write out the formula for $J_n$. From definition (2.1), we have

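$$\Lambda_n = \frac{1}{H}\sum_{h=1}^{H}\Big[\frac{1}{c(c-1)}\sum_{l=2}^{c}\sum_{j=1}^{l-1}(z_{(h,l)} - z_{(h,j)})^2\Big]^2.$$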
For every $z$, we have $z = m(y) + \varepsilon$. Thus, for any pair $l$ and $j$,

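$$\begin{aligned}(z_{(h,l)} - z_{(h,j)})^2 &= (m(y_{(h,l)}) - m(y_{(h,j)}))^2 + (m(y_{(h,l)}) - m(y_{(h,j)}))(\varepsilon_{(h,l)} - \varepsilon_{(h,j)})^T\\ &\quad + (\varepsilon_{(h,l)} - \varepsilon_{(h,j)})(m(y_{(h,l)}) - m(y_{(h,j)}))^T + (\varepsilon_{(h,l)} - \varepsilon_{(h,j)})^2\\ &=: S_1(h,l,j) + S_2(h,l,j) + S_3(h,l,j) + S_4(h,l,j).\end{aligned}$$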
Further, $\Lambda_n$ can be written as

$$\Lambda_n = \frac{1}{nc(c-1)^2}\sum_{h=1}^{H}\Big[\sum_{l=2}^{c}\sum_{j=1}^{l-1}\big(S_1(h,l,j) + S_2(h,l,j) + S_3(h,l,j) + S_4(h,l,j)\big)\Big]^2.$$

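Two elementary facts behind this expansion can be checked numerically: the average of $(z_{(h,l)} - z_{(h,j)})^2$ over the pairs of one slice, scaled by $1/(c(c-1))$, is the usual within-slice sample covariance with divisor $c - 1$, and $S_1 + S_2 + S_3 + S_4$ reproduces $(z_{(h,l)} - z_{(h,j)})^2$. In the sketch below, arbitrary random vectors stand in for $m(y_{(h,l)})$ and $\varepsilon_{(h,l)}$; this is an illustrative assumption, not the model of Theorem 2.1.

```python
import numpy as np
from itertools import combinations


def pairwise_form(zs):
    """(1 / (c(c-1))) * sum over pairs j < l of (z_l - z_j)(z_l - z_j)^T."""
    c, p = zs.shape
    S = np.zeros((p, p))
    for j, l in combinations(range(c), 2):   # all index pairs with j < l
        d = zs[l] - zs[j]
        S += np.outer(d, d)
    return S / (c * (c - 1))


rng = np.random.default_rng(2)
c, p = 7, 3
m = rng.standard_normal((c, p))      # stands in for m(y_(h,l)), l = 1..c
eps = rng.standard_normal((c, p))    # stands in for eps_(h,l)
zs = m + eps                         # z = m(y) + eps within one slice

# Pairwise form equals the within-slice sample covariance (divisor c - 1).
print(np.allclose(pairwise_form(zs), np.cov(zs, rowvar=False)))

# Four-term split for the pair (l, j) = (2, 1) in 1-based indexing.
dm, de = m[1] - m[0], eps[1] - eps[0]
S1, S2, S3, S4 = np.outer(dm, dm), np.outer(dm, de), np.outer(de, dm), np.outer(de, de)
print(np.allclose(S1 + S2 + S3 + S4, np.outer(zs[1] - zs[0], zs[1] - zs[0])))
```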
For the sake of notational simplicity, we let

$$C_n(i,k) = \frac{1}{nc(c-1)^2}\sum_{h=1}^{H}\sum_{l=2}^{c}\sum_{j=1}^{l-1}\sum_{v=2}^{c}\sum_{u=1}^{v-1}S_i(h,l,j)\,S_k(h,v,u). \tag{A.1}$$

Then $\Lambda_n = \sum_{i=1}^{4}\sum_{k=1}^{4}C_n(i,k)$. Note that $\tilde\Lambda_n = C_n(4,4)$ and thus $J_n = \Lambda_n - C_n(4,4)$. To show that $n^{\beta}J_n = o_p(1)$, we need only show that, under the conditions of Theorem 2.1, for every pair $(i,k)$ except $i = k = 4$, $n^{\beta}C_n(i,k)$ converges to 0 in probability as $n \to \infty$. Without loss of generality, we consider only the upper-left element of $C_n(i,k)$, as the other elements can be handled similarly. With a slight abuse of notation, we use the same symbol $C_n(i,k)$ for this element; therefore, in the following proof, $C_n(i,k)$ is real-valued.

For each $q$ such that $0 < q < 1/2$, divide the outer summation over $h$ into three summations, from 1 to $[Hq]$, from $[Hq] + 1$ to $[H(1-q)]$ and from $[H(1-q)] + 1$ to $H$, to obtain

$$C_n(i,k) = C_{1n}(i,k) + C_{2n}(i,k) + C_{3n}(i,k).$$
For $C_{2n}(i,k)$, we have

$$|C_{2n}(i,k)| \le \frac{1}{nc(c-1)^2}\sum_{h=[Hq]+1}^{[H(1-q)]}\sum_{l=2}^{c}\sum_{j=1}^{l-1}\sum_{v=2}^{c}\sum_{u=1}^{v-1}\|S_i(h,l,j)\|\,\|S_k(h,v,u)\|,$$

where $\|S\|$ denotes the maximum absolute value among the elements of $S$. For $\|S_i(h,l,j)\|\cdot\|S_k(h,v,u)\|$, we note that when $h \in [[Hq] + 1, [H(1-q)]]$, there is a compact set $[-B(q), B(q)]$ such that, in probability, both $y_{([nq]+1)}$ and $y_{([n(1-q)])}$ belong to that set. As $m(y)$ is bounded on any compact set, there exists a $Q > 0$ such that, in probability, $\|m(y_{(h,j)})\| \le Q$. Let $\varepsilon_{(n)}$ and $\varepsilon_{(1)}$ denote the largest and the smallest of all the $\varepsilon_{(j)}$'s, respectively. When $i$ and $k$ are fixed, we can determine $s$ such that

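$$\sum_{l=2}^{c}\sum_{j=1}^{l-1}\sum_{v=2}^{c}\sum_{u=1}^{v-1}\|S_i(h,l,j)\|\,\|S_k(h,v,u)\| \le c(c-1)\,\|\varepsilon_{(n)} - \varepsilon_{(1)}\|^{4-s}\sum_{l=2}^{c}\sum_{j=1}^{l-1}(2Q)^{s-1}\|m(y_{(h,l)}) - m(y_{(h,j)})\| + o_p(1).$$

As $i$ and $k$ cannot both equal 4, we have $1 \le s \le 4$ and hence

$$|C_{2n}(i,k)| \le \frac{2^{s-2}\,\|\varepsilon_{(n)} - \varepsilon_{(1)}\|^{4-s}\,Q^{s-1}\,c\,\sup_{\Pi_n(B(q))}\sum_{j=1}^{n-1}\|m(y_{(j+1)}) - m(y_{(j)})\|}{n} + o_p(1) =: C'_{2n}(s) + o_p(1).$$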

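The passage from a double sum over pairs within a slice to the total variation $\sum_j \|m(y_{(j+1)}) - m(y_{(j)})\|$ rests on an elementary bound: for $y$-ordered values, each $|m(y_{(h,l)}) - m(y_{(h,j)})|$ is at most the variation of $m$ inside slice $h$, so the $c(c-1)/2$ pairs contribute at most $c(c-1)/2$ times that variation, and the slice variations add up to at most the overall variation. A scalar sketch, with a hypothetical regression curve $m(y) = \sin(3y)$ chosen only for illustration:

```python
import numpy as np


def pairwise_abs_sum(mv):
    """Sum over pairs l > j of |m_l - m_j| for values ordered by y within a slice."""
    c = len(mv)
    return sum(abs(mv[l] - mv[j]) for l in range(1, c) for j in range(l))


def total_variation(mv):
    """Sum of |m_{i+1} - m_i| along the y-ordering."""
    return float(np.abs(np.diff(mv)).sum())


rng = np.random.default_rng(3)
y = np.sort(rng.uniform(-2.0, 2.0, size=60))
mv = np.sin(3.0 * y)          # hypothetical regression curve m(y), scalar case
c = 6
H = len(y) // c
slices = [mv[h * c:(h + 1) * c] for h in range(H)]
lhs = sum(pairwise_abs_sum(s) for s in slices)
rhs = c * (c - 1) / 2 * total_variation(mv)
print(lhs <= rhs)
```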
_ i 

Using Lemma 1 of [14], we have n 8 + Q ||e(i) — II = °pW- Condition (2) 
of Theorem 2.1 implies that lim^co n~ r sup n „(B( 9 )) E™=1 ll m (y(i+i)) — m (?/(i)) II 
0. As s > 1, C£ n (s) = Op(n r+ 8t^ +6_1 ) and therefore when ^ + 6 + r + ^ < 1, 
n^C' 2n {s) — ► 0. We now consider C\ n (i,k) and C3 n (z,fc). If y is not bounded, 
we choose a sufficiently small $q$ so that $P(y_{([n(1-q)])} > B_0) \to 1$ as $n \to \infty$, where $B_0$ is given by condition (3) of Theorem 2.1. Using the nonexpansive property of $M(y)$, we can prove that

3 ||— - ||4— s 

C 3n (i,k) < P c H £ H 2 - £ (i)ll ]|M(y (ra) )-M(y ([ra(1 _ g)]) )]| s J(i/ ([n(1 _ g)1) >Bo) 

+ o p (l) 
=:C 3n ( S ) + o p (l). 

By condition (3) and Lemma 1 of [14], it can be shown that when (3 + b + 
< 1, $n^{\beta}C'_{3n}(s) = o_p(1)$. The reasoning for $C_{1n}(i,k)$ is similar, so we omit the details. The proof is thus complete. □

Proof of Theorem 2.2. The conditioning method is used to prove Theorem 2.2 and the other theorems. Denote $\mathcal{F}_n = \sigma\{y_1, \ldots, y_n\}$. To compute $E(\Lambda_n)$, we first compute the conditional expectation of $\Lambda_n$ given the $y_i$'s as follows, where $\Lambda_n$ is defined in Section 2.1:



E(A»|^n) 
h h

He c ^ 



(A.2) + EE E - (i + T^a )E(e( W )ef fcl ,)|J r n)E(e (Me f M |^ B ) 
h=i 1=1 „=i(„^) nc v v c x w 

n=l (=1 d=1(i;^:Z) v ' 

=: E(A ln |.F n ) + E(A 2n |.F n ) + E(A 3n |.F n ). 

As the $\varepsilon_{(i)}$'s are conditionally independent given the $y_i$'s, $E(A_{1n}\mid\mathcal{F}_n)$ is equal to $\frac{1}{nc}\sum_{j=1}^{n}E((\varepsilon_j\varepsilon_j^T)^2\mid y_j)$. This is a sum of i.i.d. random variables, and therefore $E(A_{1n}) = \frac{1}{c}E[(\varepsilon\varepsilon^T)^2]$. For $E(A_{2n}\mid\mathcal{F}_n)$, the conditional independence property and the definition $m_1(y) = E(\varepsilon\varepsilon^T\mid y)$ together yield that

E(A 2n |jF n ) 

(c — l) 2 + 1 ^ c c x 



(£-2) 



-E(A 21n |^ n )+E(^ 22n |J : n ). 

(c-2) \ V n 

c(c-i) / 



As E(A 21n |jF n ) = 1(1 - E"=i mjfef, we have that E(A 21n ) = (1 



For $E(A_{22n}\mid\mathcal{F}_n)$, the conclusion is

(A.3) E(^ 22n |^ n ) = o p (cn~ 1+max{ri ^ } ). 

The proof essentially follows the lines of the proof of Theorem 2.1. For each $q_1$ such that $0 < q_1 < 1/2$, we divide the outer summation over $h$ into three summations: from 1 to $[Hq_1]$, from $[Hq_1] + 1$ to $[H(1-q_1)]$ and from $[H(1-q_1)] + 1$ to $H$. Hence, $E(A_{22n}\mid\mathcal{F}_n) = D_{1n} + D_{2n} + D_{3n}$. Note that when $h \in [[Hq_1] + 1, [H(1-q_1)]]$, there exists a constant $Q_1$ such that $\|m_1(y_{(h,l)})\| \le Q_1$ for all $1 \le l \le c$. Thus, as $m_1(y)$ has total variation of order $r_1$,

Qi((c-l) 2 + l)p 3 sup nn(i j (gi)) y:™ =1 ||mi(y (i+1) )-mi(y w )|| 

^271 < 7 TT l"<WJ 

n(c — 1) 



o(cn 



-l+rv 
If y is not bounded, then we choose a sufficiently small q\ so that P(yn n (i~ qi )]) >
B' ) — ► 1 as n — > oo, where -Bq is given by condition (3) of Theorem 2.2. Sim- 

l , 2 

ilarly, L>3„ = o p (cn 4+a / 2 ). The proof is similar to that for Z?i n and (A. 3) 
then holds. By condition (5) and Lemma 4.11 of [15], we have 



(A.4) 



/ — l+max{ri, , 2 /o ,0i}> 
o(cn '4+a/2>f J ^ 



E(^22n) 

The proof for $E(A_{3n}\mid\mathcal{F}_n)$ in (A.2) is very similar to the one just given

and we can thus obtain E(A 3n ) = o(c- 1 n" 1+max{ri ' 3 +^' pl} ). Hence, (2.3) is 
proved. 
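Why a fixed number $c$ of points per slice leaves a non-vanishing bias can be seen in the simplest pure-noise case $m(y) = 0$ with Gaussian errors. This is an illustrative assumption, not the general setting of Theorem 2.2: for $N(0,1)$ errors, the within-slice variance $s^2$ (divisor $c - 1$) satisfies $(c-1)s^2 \sim \chi^2_{c-1}$, so $E(s^4) = 1 + 2/(c-1)$, and the average of $s^4$ over many slices converges to this value rather than to 1 unless $c \to \infty$. The helper name is hypothetical.

```python
import numpy as np


def mean_slice_variance_squared(c, n_slices, seed=0):
    """Average of s_h^4 over n_slices slices of c i.i.d. N(0, 1) errors (case m(y) = 0)."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_slices, c))
    s2 = eps.var(axis=1, ddof=1)      # within-slice variance with divisor c - 1
    return float(np.mean(s2 ** 2))


for c in (2, 5, 20):
    approx = mean_slice_variance_squared(c, 100_000)
    print(c, round(approx, 3), "theory:", 1 + 2 / (c - 1))
```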

We now turn to the proof of the second conclusion, (2.4), that $n^{\beta}(\Lambda_n - \Lambda) = o_p(1)$. Without loss of generality, consider the upper-rightmost element of $n^{\beta}(\Lambda_n - \Lambda)$; with a slight abuse of notation, we still write $n^{\beta}(\Lambda_n - \Lambda)$ for this element. Note that $n^{\beta}\{\Lambda_n - \Lambda\} = n^{\beta}\{\Lambda_n - E(\Lambda_n\mid\mathcal{F}_n) + E(\Lambda_n\mid\mathcal{F}_n) - \Lambda\}$. From the proof of (2.3), we can obtain that when $\beta < b$ and
/3<l-b- max{n, j^j^}, 

$$n^{\beta}\{E(\Lambda_n\mid\mathcal{F}_n) - \Lambda\} = o_p(1). \tag{A.5}$$

Therefore, it remains to show that $n^{\beta}\{\Lambda_n - E(\Lambda_n\mid\mathcal{F}_n)\} = o_p(1)$, and it suffices to show that its second moment converges to zero, that is, as $n \to \infty$,

$$n^{2\beta}E\big[(\Lambda_n - E(\Lambda_n\mid\mathcal{F}_n))^2\big] \to 0. \tag{A.6}$$

Invoking (A.2) and the definition of $\Lambda_n$ given in Section 2.1, and rearranging the terms, we see that



(,4 n -E(A n |JF re )) 

H 



7 E E £ \h,i) £ \h,v) 

U 1=1 v=l(v^l) 



+ 



+ 



-J2 E ( E ( e (M)l^0))( E ( e (Ml^(M)) 



1=1 v =l( v ^l) 



1 c 



i=i 



c(c-l) 



^ c c c c 

_ i\2 E E E E 



'=1 J"=10¥0 " =1 u=1(u^v) 



£ (h,l) £ Jh,j) £ (h,v)£jh,u) 



c(c 



-ni E E ( E (4M)ly(M)))( E (4^(M)) 

' Z=l u=l(t;^) 
c(c ■



T) EE E 

I \ 1=1 v=l u=l(u^v) 



2 T 
£ (h,l) £ (h,v) £ (h,u) 



+ E E E £ (M) e (M £ (M) 



H 



■ ~ W + + + V 3 (h)}. 



n 



1 1=1 



We again use the conditioning method to show that $\sum_{h=1}^{H}E V_i^2(h) = o(1)$ for $i = 0, 1, 2$ and 3, and then use the inequality $2|V_i(h)V_j(h)| \le V_i^2(h) + V_j^2(h)$ to deduce, from the convergence of the $E(V_i^2(h))$ terms, that the cross terms also converge to zero. The proof of Theorem 2.2 can then be completed. We now proceed to the first step as follows.

To simplify the notation, we write, for any integer $l \ge 1$, $E^l(\varepsilon^s\mid y) = E^{l-1}(\varepsilon^s\mid y)E(\varepsilon^s\mid y)$, where $1 \le s \le 6$. By elementary calculation, we obtain

n — J2 E(V?(ft)) = O ( ^Es 8 - IL^(E Vl*)) J = o(l). 



h=\ 



^Ef=iE(F 2 2 (M) can be bounded by 

( 



E(E (e |y)) + — _E(E 3 (s 3 |y)) + __E(E 3 (e 4 |y)) 



f -E(E 3 (e 2 |y)) + _ -EE 2 (e 4 |y) . 



nc* 



Since E(e 12 ) < oo, it is o p (l). Similarly, we have ^£ J2h=i E(V 3 2 (/i)) = o p (l). 

Using the conditioning method, we can also prove that the sum involving $E(V_0^2(h))$ converges to zero. First, we have



B(V \h)\T) 



E E e (4m)I^) e ( £ (mI^)) 



3E E E 2 (s 2 M) | 2/(hj0 )E 2 ( £ 2 /lJ . ) | 2/( , j) ) 



+ 



4c 



,2 c 



E(E(4, ) b (ft)0 )E 2 (4, ) |^, )-E 4 ( £ L |y (M) )) 



i=i 



l<tyj^v<c 



+ 
where



l<l^j^u<c J 

=: F o(/i) - Vbi(/i) + V 02 (h) - V 03 (h) - V 0i (h) 
+ V 05 (h) + V 06 (h), 

u h,l,j,v = ni2(j/(h,o)(mi(y( h) j)) - mi(y (M) ))mi(y (/M) ), 
«fc,lj> = m 2(y(ft,o)mi(y (M ))(mi(y (ft) Q) - mi(y (M )), 
u h,lj> = m i(i/(h,i))( m i(y(h,I)) ~ m i(f(M))) m i(f(A.0)' 

u tij> = m i(y(/ l ,;)) m i(y(M))( m i(^,o) - m i(f(M))- 

We now prove that when c ~ n b and 2/3+max{2ri, 2 +q/4 + 4+0/2 > P2} + b < 2, 

all of the terms $\sum_{h=1}^{H}E(V_{0i}(h))$ tend to 0. Using the conditioning method and the inequality

E ( E th,i)\y(h,i)M4h,j)\y(hj)) < 2( E2 ( £ (M)ly(M)) + E2 ( £ (Ml y (M))> 

we have 

— £ EU00W = O E(E 2 (s|y)) = o(l). 

n z £^ \ nc / 

Similar arguments can be used to obtain $\sum_{h=1}^{H}E(V_{01}(h)) = o(1)$.

As $V_{02}(h)$ is a sum of i.i.d. random variables, invoking the conditions of Theorem 2.2, the fact that $\beta < 0.5$ and the law of large numbers, we can show that $\sum_{h=1}^{H}V_{02}(h) = o(1)$.

The proof for the sum of the $V_{03}(h)$'s is similar to that for $E(A_{22n}\mid\mathcal{F}_n)$. We choose $0 < q_2 < 1/2$ and divide the summation over $h$ into three parts: $[1, [Hq_2]]$, $[[Hq_2] + 1, [H(1-q_2)]]$ and $[[H(1-q_2)] + 1, H]$. The sums of the conditional expectations $E(V_{03}(h)\mid\mathcal{F}_n)$ over $h$ in these three intervals are analyzed separately, and the contribution of the middle range $h = [Hq_2] + 1, \ldots, [H(1-q_2)]$ can be proved to be asymptotically zero. The proof is very similar to that of (A.3), and thus we omit the details in this paper. The proof of (2.4) is thus complete.

This completes the proof of Theorem 2.2. □



Proof of Theorem 2.3. The proof is similar to that of Theorem 3.1 
below, and thus we omit the details. □ 
A.2. Proofs of the theorems in Section 3.



Proof of Theorem 3.1. Our goal is to determine the asymptotic behavior of $\frac{1}{H}\sum_{h=1}^{H}(I_p - \hat\Sigma(h))^2$, where $\hat\Sigma(h)$ is defined in (3.1) and $S_h = (y_{(c(h-1))}, y_{(ch)}]$. It suffices to show that for any $p(p+1)/2$ vector $a$, $a^T\mathrm{vech}\{\frac{1}{H}\sum_{h=1}^{H}(I_p - \hat\Sigma(h))^2\}$ is asymptotically univariate normal. Again, for the sake of notational simplicity, we consider the univariate case. Clearly, $\hat q_h = y_{(ch)}$, $h = 1, \ldots, H$, are the empirical quantiles that converge in probability to the population quantiles $q_h$, where $P(Y \le q_h) = h/H$. If we can verify the asymptotic normality of $\sqrt{n}(\hat\Sigma(h) - \Sigma(h))$ for $h = 1, \ldots, H$, then the asymptotic normality of $\Lambda_n$ can be obtained through the decomposition

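$$\begin{aligned}&\sqrt{n}\Big(\frac{1}{H}\sum_{h=1}^{H}(I_p - \hat\Sigma(h))^2 - \frac{1}{H}\sum_{h=1}^{H}(I_p - \Sigma(h))^2\Big)\\ &\quad= \frac{\sqrt{n}}{H}\sum_{h=1}^{H}(\Sigma(h) - \hat\Sigma(h))(2I_p - \hat\Sigma(h) - \Sigma(h))\\ &\quad= \frac{2\sqrt{n}}{H}\sum_{h=1}^{H}(\Sigma(h) - \hat\Sigma(h))(I_p - \Sigma(h)) + o_p(1).\end{aligned}\tag{A.7}$$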

We now study $\sqrt{n}(\hat\Sigma(h) - \Sigma(h))$. From (3.1),

1 n ( 1 n - \ 2 

(A.8) 

= S 1 (/ l )-(^ 1 (/ l )) 2 . 
Next, we calculate $\sqrt{n}(\hat S_1(h) - S_1(h))$. Note that $\hat p_h = p_h = 1/H$ and thus

1 n 

1 n 

(A.9) + E e 5,) - Jfo e 5,)) 

=:S u (fc) + S12W. 

Clearly, $\hat S_{11}(h)$ is asymptotically normal because it is a sum of i.i.d. random variables.

For $\hat S_{12}(h)$, we first introduce the notation $F(Y, z, a, b) = z^2(I(Y \in (a, b]) - I(Y \in S_h))$ for any pair $(a, b)$. Note that $\hat q_h - q_h = O_p(1/\sqrt{n})$. Invoking Theorem 1 of Zhu and Ng [27] or the argument used in Stute and Zhu [22] and Stute, Thies and Zhu [21], we can show that



$$\frac{1}{\sqrt{n}}\sum_{j=1}^{n}\Big\{F(y_j, z_j, \hat q_{h-1}, \hat q_h) - E\big(F(Y, z, \hat q_{h-1}, \hat q_h)\big)\Big\} = o_p(1).$$



Together with (A.10), the continuity of $E(F(Y, z, q_{h-1}, q_h))$ at $q_{h-1}$ and $q_h$, the $\sqrt{n}$ consistency of $\hat q_h$ and a Taylor expansion give

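$$\begin{aligned}\sqrt{n}\,\hat S_{12}(h) &= H\sqrt{n}\,E(F(Y, z, \hat q_{h-1}, \hat q_h)) + o_p(1)\\ &= H\sqrt{n}\,(\hat q_{h-1} - q_{h-1},\ \hat q_h - q_h)\,F'(q_{h-1}, q_h) + o_p(1)\\ &= \frac{H}{\sqrt{n}}\sum_{j=1}^{n}\Big(\frac{(h-1)/H - I(y_j \le q_{h-1})}{f(q_{h-1})},\ \frac{h/H - I(y_j \le q_h)}{f(q_h)}\Big)F'(q_{h-1}, q_h) + o_p(1),\end{aligned}\tag{A.10}$$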

where $F'$ is the derivative of $E(F(Y, z, a, b))$ with respect to $(a, b)$. The asymptotic normality can be shown to hold by using well-known results on the empirical quantiles $\hat q_h$ (see [20]).

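The Taylor step relies on the $\sqrt{n}$ consistency of the empirical quantiles and on the standard Bahadur-type linearization $\sqrt{n}(\hat q - q) \approx n^{-1/2}\sum_j (p - I(y_j \le q))/f(q)$. A small numerical check, assuming standard normal $y$ and using NumPy together with Python's `statistics.NormalDist`; the function name is hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np


def bahadur_check(n, p, seed=0):
    """Compare sqrt(n)(q_hat - q) with its linear (Bahadur-type) approximation."""
    rng = np.random.default_rng(seed)
    y = np.sort(rng.standard_normal(n))
    dist = NormalDist()                    # y ~ N(0, 1) is an assumption of this sketch
    q = dist.inv_cdf(p)                    # population quantile: P(Y <= q) = p
    q_hat = y[math.ceil(n * p) - 1]        # empirical quantile (an order statistic)
    exact = math.sqrt(n) * (q_hat - q)
    linear = math.sqrt(n) * (p - np.mean(y <= q)) / dist.pdf(q)
    return exact, linear


exact, linear = bahadur_check(1_000_000, 0.3)
print(round(exact, 4), round(linear, 4))   # the two should agree up to a small remainder
```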
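For $(\hat E_1(h))^2$ in (A.8), the foregoing argument can be applied to $\sqrt{n}((\hat E_1(h))^2 - (E_1(h))^2)$, giving

$$\begin{aligned}\sqrt{n}\big((\hat E_1(h))^2 - (E_1(h))^2\big) &= 2\sqrt{n}\,(\hat E_1(h) - E_1(h))E_1(h) + o_p(1)\\ &= \frac{2H}{\sqrt{n}}\sum_{j=1}^{n}\big(z_j I(y_j \in S_h) - E(zI(Y \in S_h))\big)E_1(h)\\ &\quad + \frac{2H}{\sqrt{n}}\sum_{j=1}^{n}\Big(\frac{(h-1)/H - I(y_j \le q_{h-1})}{f(q_{h-1})},\ \frac{h/H - I(y_j \le q_h)}{f(q_h)}\Big)G'(q_{h-1}, q_h)\,E_1(h) + o_p(1),\end{aligned}\tag{A.11}$$

where $G'(a, b)$ is the derivative of $E(G(Y, z, a, b)) := E(z(I(Y \in (a, b]) - I(Y \in S_h)))$ with respect to $(a, b)$. Together with (A.8)-(A.12), we have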



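$$\begin{aligned}&\sqrt{n}\Big(\frac{1}{H}\sum_{h=1}^{H}(I_p - \hat\Sigma(h))^2 - \frac{1}{H}\sum_{h=1}^{H}(I_p - \Sigma(h))^2\Big)\\ &\quad= \frac{1}{\sqrt{n}}\sum_{j=1}^{n}\sum_{h=1}^{H}(-2)\Big\{(z_j^2 - 2z_jE_1(h))I(y_j \in S_h) - E\big((z^2 - 2zE_1(h))I(Y \in S_h)\big)\\ &\qquad + \Big(\frac{(h-1)/H - I(y_j \le q_{h-1})}{f(q_{h-1})},\ \frac{h/H - I(y_j \le q_h)}{f(q_h)}\Big)\big(F'(q_{h-1}, q_h) - 2G'(q_{h-1}, q_h)E_1(h)\big)\Big\}(I_p - \Sigma(h)) + o_p(1)\\ &\quad =: \frac{1}{\sqrt{n}}\sum_{j=1}^{n}L(y_j, z_j) + o_p(1) \Longrightarrow N(0, \Lambda'),\end{aligned}$$

where $\Lambda' = \mathrm{Cov}(L(Y, z))$. □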



Proof of Theorem 3.2. We present the proof only for the univariate case. As $c \to \infty$, it is equivalent to show that, when $c$ satisfies the required conditions,

H 1 c 

o p (l). 

r- ' 



(A.12) ^ E \ £(*(M - , W ) 4 - E(s 4 )) 



Some elementary calculation yields 
I h 1 c 

7?E :E( Z (M -%)) 4 
71 h=i j=i 

l H l c 

= tjE -E e (ftj) 



4e (M ^ id ^3 , 1 / 



c 3 



Rnl + Rn2, 



where A {h) = ^=1 and B (M = ES=i( m (2/(M)) - m(y (fej) )). Rear- 

ranging the summands in R nl , we can easily show that \fn\Rn\ — E(e 4 )] = 
^Ej=i(4-E(e 4 )) follows the distribution JV(0, var(e 4 )) and thus ^[R n i- 
E(e 4 )] = o p (l). Hence, to prove (A.12), we only need to show that 



^R n2 =o p (l). 
We find that the terms in ^R n 2 have the following two common formats.
For $1 \le s_1 \le 4$,

and for $1 \le s' \le 4$ and $0 \le s \le 4 - s'$,

Therefore, our task is to prove that they are all $o_p(1)$. For the $K(s_1)$'s, we need only show that their second moments converge to 0. The main idea is to use the conditioning method to compute their conditional expectations given the $y_i$'s and to approximate the $K(s_1)$'s by sums of i.i.d. random variables. The arguments are very similar to those in the proof of Theorem 2.1, and the details can be found in [18].

For $W(s, s')$ of (A.16), we note that if we let $d = \max_{1 \le j \le n}|\varepsilon_j|$, then

$|\varepsilon_{(h,j)}| \le d$ and thus

I c I — 

h = l j = l 

For each $q$ such that $0 < q < 1/2$, we divide the outer summation over $h$ into three summations, from 1 to $[Hq]$, from $[Hq] + 1$ to $[H(1-q)]$ and from $[H(1-q)] + 1$ to $H$, which allows us to write $W(s, s') = W_1(s, s') + W_2(s, s') + W_3(s, s')$. We then use the argument that was used to prove Theorem 2.1 to show that $W(s, s') = o_p(1)$. (A.14) is thus proved and the proof of Theorem 3.2 is complete. □

Proof of Corollary 3.1. We want to show that for any $p(p+1)/2$ vector $a$, $a^T\mathrm{vech}\{\mathrm{CSAVE}_n - \Lambda\}$ is asymptotically univariate normal with zero mean and finite variance. Denote

{/ -|\ c e 1 c 

(c _ c 1)2 + 1 E E(4m) £ (m) - cA - - X>(m - ~ e ih)f 

- A E E ((^,o - ^)) 2 - 2E (^i,)) } • 

1=2 j=l ) 

To prove the asymptotic normality, we check the four conditions of the conditional central limit theorem (CCLT) provided by Hsing and Carroll [14], Theorem A.4. From Theorem 3.2, $\sqrt{n}\,a^T\mathrm{vech}\{\mathrm{CSAVE}_n - E(I_p - \Sigma_{z|y})^2\}$ is asymptotically equivalent to $\frac{1}{\sqrt{n}}\sum_{h=1}^{H}Z_{nh}$. As $Z_{n1}, \ldots, Z_{nH}$ are conditionally independent given $\mathcal{F}_n$, condition (1) of the CCLT is satisfied.

To check conditions (2)-(4) of the CCLT, the calculation is very similar to that in the proofs of Theorems 2.2 and 3.2. For the conditional expectation of $Z_{nh}$, we have

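$$\frac{1}{\sqrt{n}}\sum_{h=1}^{H}E(Z_{nh}\mid\mathcal{F}_n) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n}a^T\mathrm{vech}\{m_1^2(y_{(j)}) - \Lambda - 2(m_1(y_{(j)}) - E(\Sigma_{z|y}))\} + o_p(1) \longrightarrow_d N(0, a^T\Lambda_1 a), \tag{A.17}$$

where $\Lambda_1 = \mathrm{var}(\mathrm{vech}\{m_1^2(y_{(j)}) - \Lambda - 2(m_1(y_{(j)}) - E(\Sigma_{z|y}))\})$, and hence condition (4) of the CCLT is satisfied. For condition (2), we need only note that, together with conditional independence,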



H 

n 



E{(Z n ^ - E(Z n ? i |J : " n )) 2 |J' : " ?t } 

h=i 



1 ™ 

- H aTvec M( m 2(2/(j)) - mf (y(,-)))m? (%■))}{ 



n . 

3=1 



4 n 

+ a T vech{m 2 (y (j) ) - m?(yy))}a 
n i=i 
4 n 

(A.18) - -^a T vech{(m 2 (y (j) ) - m 2 l (y (j) ))m 1 (y {j) )}a + o p {l) 

5=1 

= a T vech{E[(m 2 (y) - m?(y))m?(y) + 4(m 2 (y) - mf (y)) 

-4(m 2 (y) -m?(y))mi(y)]}a + o p (l) 
=:a T A 2 a + o p (l). 

Condition (3) of the CCLT can be checked using a similar argument. The main idea is as follows. Invoking the conditional independence of the $Z_{nh}$'s and the existence of the 12th moment, we can use a method similar to the one used to prove Liapounoff's central limit theorem (see, e.g., Pollard [19]) to verify condition (3) of the CCLT. Hence, the CCLT implies that $\frac{1}{\sqrt{n}}\sum_{h=1}^{H}Z_{nh}$ is asymptotically normal with zero mean and variance $a^T(\Lambda_1 + \Lambda_2)a$.

When the $Z_j$'s are used to construct the statistic, as in the proofs of the other theorems, the asymptotic normality holds with limiting variance $a^T(\Lambda_1 + \Lambda_2 + \Sigma_1)a$, where $\Sigma_1$ is the random matrix defined in (2.11). The proof is thus complete. □